-
Notifications
You must be signed in to change notification settings - Fork 3
[Codex] Add handling for Conversational RAG to Validator API #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -108,3 +112,40 @@ def test_update_scores_based_on_thresholds() -> None: | |||
for metric, expected in expected_is_bad.items(): | |||
assert scores[metric]["is_bad"] is expected | |||
assert all(scores[k]["score"] == raw_scores[k]["score"] for k in raw_scores) | |||
|
|||
|
|||
def test_prompt_tlm_with_message_history() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test to confirm there is no query rewriting happening, whenever this is the first user message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test to confirm that the primary TrustworthyRAG.score(prompt, response)
call happens with prompt
reflecting the full chat history, not with prompt
reflecting the rewritten query.
Confirm you are using this TLM utils method:
cleanlab/cleanlab-tlm@a479e32
to turn the chat history into a prompt string.
@@ -108,3 +132,38 @@ def is_bad(metric: str) -> bool: | |||
if is_bad("trustworthiness"): | |||
return "hallucination" | |||
return "other_issues" | |||
|
|||
|
|||
def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this name validate_messages
should be more carefully chosen when the entire validator
module reserves the name method validate
in Validator for looking at the trustworthiness & Eval scores.
I'd bet we wouldn't change the Validator.validate api, but we could find a different name for validate_messages
since it behaves quite differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider having validate_messages
take messages
as a required (positional argument):
def validate_messages(messages: Optional[list[dict[str, Any]]] = None) -> None: | |
def validate_messages(messages: list[dict[str, Any]]) -> None: |
Everywhere it's being called, it takes in a messages
argument.
The caller already sets a default value for that argument, so I'd advise against setting default values in two function signatures.
@@ -296,6 +318,25 @@ def _remediate(self, *, query: str, metadata: dict[str, Any] | None = None) -> s | |||
codex_answer, _ = self._project.query(question=query, metadata=metadata) | |||
return codex_answer | |||
|
|||
def _maybe_rewrite_query(self, *, query: str, messages: list[dict[str, Any]]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This _maybe... prefix implies that we might get something different from the method, other than a string. Should the check for self._tlm
be done by the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the maybe is supposed to suggest we might re-write the query or not
closed because conversational capability moved to the backend |
No description provided.